Cross-lingual Language Model Pretraining

Neural Information Processing Systems

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.
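The abstract says only that the supervised method uses parallel data with a new cross-lingual language model objective. As a rough illustration, the sketch below assumes that objective is a masked-token prediction over a concatenated translation pair. The function name `make_masked_pair`, the `[MASK]` symbol, the masking rate, and the restart of position indices for the second sentence are all assumptions of this sketch, not details given in this listing.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption of this sketch)


def make_masked_pair(src_tokens, tgt_tokens, src_lang, tgt_lang,
                     mask_prob=0.15, seed=0):
    """Build one masked training example from a translation pair.

    Returns (inputs, langs, positions, labels), all of equal length.
    labels[i] holds the original token where inputs[i] was masked, else None.
    """
    rng = random.Random(seed)
    tokens = list(src_tokens) + list(tgt_tokens)
    # One language tag per token, so a model could tell the two sentences apart.
    langs = [src_lang] * len(src_tokens) + [tgt_lang] * len(tgt_tokens)
    # Positions restart at 0 for the second sentence (assumption of this sketch).
    positions = list(range(len(src_tokens))) + list(range(len(tgt_tokens)))

    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Hide the token; the model would be trained to recover it.
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, langs, positions, labels


if __name__ == "__main__":
    en = "the cat sat on the mat".split()
    fr = "le chat était assis sur le tapis".split()
    example = make_masked_pair(en, fr, "en", "fr", mask_prob=0.3, seed=1)
    for row in zip(*example):
        print(row)
```

Because tokens can be masked on both sides of the pair, a model trained on such examples could use the translation as context when predicting a hidden word. The same function applied to a single sentence (with an empty second sentence) would correspond to the monolingual-only setting.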


Reviews: Cross-lingual Language Model Pretraining

Neural Information Processing Systems

This paper uses three techniques for incorporating multi-lingual (rather than just mono-lingual) information for pretraining contextualised representations: (i) an autoregressive language modelling objective (e.g. [...]). The methods are evaluated on four tasks: (i) cross-lingual classification (XNLI), (ii) unsupervised machine translation, (iii) supervised machine translation, and (iv) low-resource language modelling. These results are important as they showcase the strong benefit of multi-lingual (rather than just mono-lingual) pretraining for multiple important downstream tasks, and achieve a new state of the art.

Originality: While the methods are not particularly novel (autoregressive and masked language modelling pretraining have both been used before for ELMo and BERT; this work extends these objectives to the multi-lingual case), the performance gains on all four tasks are still very impressive.

The empirical results are strong, and the methodology is sound and explained in sufficient technical detail.

Clarity: The paper is well written, makes the connections with the relevant earlier work, and includes important details that can facilitate reproducibility (e.g. the learning rate, number of layers, etc.).


Reviews: Cross-lingual Language Model Pretraining

Neural Information Processing Systems

This paper studies the problem of cross-lingual language model pretraining.

Pros
• An important problem is studied.

Cons
• The proposed methods are not particularly novel.

All the reviewers liked the paper.

Cross-lingual Language Model Pretraining

Conneau, Alexis; Lample, Guillaume

Neural Information Processing Systems
